\section{Introduction}

The \commonlisp{} \emph{sequence} functions are defined to work on
lists as well as vectors.  Furthermore, many of these sequence
functions accept a keyword argument \texttt{from-end} that alters the
behavior, in that elements toward the end of the sequence are favored
over elements toward the beginning of the sequence.  Other functions,
in particular \texttt{reduce}, also accept this keyword argument.

Most sequence functions are not required to process the elements from
the end of the sequence even though the value of the \texttt{from-end}
keyword argument is \emph{true}.  For example, it is allowed for
\texttt{find} to compare elements from the beginning of the sequence
and return the \emph{last} element that \emph{satisfies the test}%
\footnote{The phrase \emph{satisfy the test} has a precise meaning in
  the \commonlisp{} standard as shown in section 17.2 in that
  document.}  even if the test has side effects.  There is one
exception, however: The \texttt{count} function is required by the
standard to test the elements from the end of the sequence.%
\footnote{Though if the test has no side effects and cannot fail, as
  is the case of functions such as \texttt{eq} or \texttt{eql},
  testing from the beginning is arguably conforming behavior.}
In addition to the function \texttt{count}, the function
\texttt{reduce} also requires processing from the end of the list when
\texttt{from-end} is \emph{true}.

Processing list elements from the beginning to the end could, however,
have a significant additional cost associated with it when processing
from the end would require fewer executions of the test function, and
the additional cost increases with the complexity of the test.

In this paper, we will concentrate on the functions that are required
by the standard to process list elements from the end, and we will use
only the function \texttt{count} in our test cases.

There are of course some very simple techniques for processing elements
from the end of a list.
One such technique would be to start by reversing the list%
\footnote{By \emph{reversing the list} we do not mean modifying the
  list as \texttt{nreverse} would do, but creating a new list with the
  elements in reverse order as \texttt{reverse} would do.  The reason
  for excluding modifications to the list is that doing so might
  influence the semantics of other functions, including perhaps the
  test function or the view of the list by other threads.}  and
processing the elements from the beginning in the reversed list.  This
technique is used by several implementations, including \sbcl{},
\ccl{}, and \lispworks{}.  A major disadvantage of this technique is
that it requires $O(n)$ additional heap space, and that it requires
additional execution time by the memory allocator and the garbage
collector.

Another simple technique would be to traverse the list
\emph{recursively} and testing the elements during the
\emph{backtracking} phase of the recursion.%
\footnote{Despite considerable research, we have not been able to find
  any original reference to this technique, and it seems too trivial
  for standard text books to even discuss.  We must conclude that this
  technique must be so obvious that it was probably discovered
  independenlty by several people.}
Again, $O(n)$ extra space
is required, even though this time it is \emph{stack space} rather
than heap space, so that the memory allocator and the garbage
collector are not solicited, at least in most implementations.  Worse,
many implementations have a fairly small call stack, in particular in
multi-threaded implementations where each thread must have a dedicated
stack.  Aside from these disadvantages, this technique is however
fairly efficient in terms of execution time, because a simple function
call is quite fast on most modern processors.  For that reason, we
will use recursion as the basis of the technique described in this
paper, but with fairly few recursive calls so that the additional
extra space is modest.

Throughout this paper, we assume that the lists to be processed have a
large number of elements, for several reasons:

\begin{itemize}
\item We do not want the list to be small enough to fit in the cache,
  because cache performance depends on other workloads as well.
\item For short lists, performance may be dominated by the overhead of
  calling a few functions, or by loop prologues and epilogues.  By
  using long lists, we make sure that performance is dominated by
  traversing the list and computing the test.
\item If the list is too short, it can be processed by a simple
  recursive technique.  In order to avoid this possibility, we want
  the lists to have orders of magnitude more elements than the size of
  the stack.
\end{itemize}

Furthermore, throughout this paper, we will assume that the
\emph{test} to be performed on the elements of the list is the
function \texttt{eq}.  By making this assumption, we expose the worst
case for our technique, because the execution time will then be
dominated by the overhead of traversing the list, as opposed to by
executing the test function.

It should be noted that the difficulty of processing list elements in
reverse order is due to the way \commonlisp{} practically imposes the
representations of such lists.  Other representations are, of course,
possible.  For instance, Hughes \cite{Hughes:1986} suggested a
representation of lists as first-class functions.  Similarly, in his
talk on parallelism in 2009,%
\footnote{http://groups.csail.mit.edu/mac/users/gjs/\\
6.945/readings/MITApril2009Steele.pdf}
Guy Steele proposed a representation of lists for parallel processing,
based on using the four operations \texttt{item}, \texttt{split},
\texttt{list}, and \texttt{conc}.  Clearly, such alternative
representations could be devised that facilitate processing elements
in reverse order.

In this paper, we use the international convention \cite{ISO80000} for
writing logarithms.  Hence, we write $\mathsf{lb}\thinspace n$ for the
logarithm in base~$2$.  We use $\mathsf{log}$ only when the base is
unimportant.
Given a real number $n$, the notation
$\lfloor n \rfloor$ represents the \emph{floor} of $n$ 
and $\lceil n \rceil$ \emph{ceiling}. For example, $\lfloor 1.5 \rfloor = 1$
and $\lceil 1.5 \rceil = 2$. 
%%  LocalWords:  startup runtime allocator
